7 research outputs found

    An architecture for a focused trend parallel web crawler with the application of clickstream analysis

    The tremendous growth of the Web poses many challenges for general-purpose single-process crawlers, including irrelevant answers among search results and coverage and scaling issues arising from the enormous size of the World Wide Web. Hence, more refined and convincing algorithms are in demand to yield more precise and relevant search results in a reasonable amount of time. Because link-based Web page importance metrics impose considerable communication overhead when used within a multi-process crawler and cannot produce a precise answer set, employing these metrics in search engines is not a complete solution for identifying the best answer set. A link-independent Web page importance metric is therefore required to govern the priority rule within the queue of fetched URLs. This paper proposes a modest weighted architecture for a focused structured parallel Web crawler that employs a link-independent, clickstream-based Web page importance metric. Experiments with this metric over the restricted boundary of the heavily visited UTM University Web site show the efficiency of the proposed metric.
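    The priority rule described above can be sketched as a frontier of fetched URLs ordered by a link-independent clickstream score. The scoring formula and the visit/dwell-time inputs below are illustrative assumptions, not the paper's actual metric:

```python
import heapq

def clickstream_score(visits, avg_duration_s):
    """Hypothetical link-independent importance score: weight page
    popularity (visit count) by engagement (average dwell time)."""
    return visits * (1.0 + avg_duration_s / 60.0)

class CrawlFrontier:
    """Priority queue of discovered URLs, highest clickstream score first."""
    def __init__(self):
        self._heap = []
        self._seen = set()

    def push(self, url, visits, avg_duration_s):
        if url in self._seen:          # avoid re-queueing known URLs
            return
        self._seen.add(url)
        # negate the score because heapq is a min-heap
        heapq.heappush(self._heap,
                       (-clickstream_score(visits, avg_duration_s), url))

    def pop(self):
        return heapq.heappop(self._heap)[1]

frontier = CrawlFrontier()
frontier.push("http://example.edu/a", visits=120, avg_duration_s=30)
frontier.push("http://example.edu/b", visits=10, avg_duration_s=300)
frontier.push("http://example.edu/c", visits=500, avg_duration_s=5)
print(frontier.pop())  # prints "http://example.edu/c" (highest score)
```

    In a parallel deployment, each crawler process would hold such a frontier, which is the point of the link-independence argument: the score needs no cross-process link information.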

    Sentiment Mining on Products Features based on Part of Speech Tagging Approach

    Abstract: In today's competitive business environment, attention to customer feedback has become a valuable asset for organizations. Organizations have found that satisfied customers are not only repeat buyers but also advocates for the organization. Therefore, correct analysis of their feedback, supported by information technology tools, is a key element of commercial success. People generally share opinions about purchased goods on Web sites or in social networks. Extracting these opinions is a specialized branch of text mining known as sentiment mining. Although the field is relatively new, extensive research has been done in recent years on sentiment analysis and intent classification. This paper therefore proposes a model for sentiment mining that extracts users' opinions and product features. A dataset of customer comments was built by collecting comments about specific digital products from a Web site. The paragraph-level opinions are then split into sentences, and the sentences are separated into subjective and objective categories. Next, users' opinions and product features are extracted from the subjective sentences using the Stanford POS tagger, relying on the TF-IDF factor to identify product features and on SentiWordNet tools to determine opinion polarity. In this way, user satisfaction with specific features of a product can be detected. For evaluation, the three measures Recall, Precision, and F-Measure indicate the accuracy of each part of this research.
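    The core pairing step described above (nouns as candidate product features, sentence polarity from a word-level sentiment lexicon) can be sketched as follows. The toy lexicon stands in for SentiWordNet, and the pre-tagged input stands in for the Stanford POS tagger's output; both are assumptions for illustration:

```python
# Toy stand-in for SentiWordNet: word -> (positive, negative) scores.
LEXICON = {
    "excellent": (0.875, 0.0),
    "sharp":     (0.25,  0.0),
    "poor":      (0.0,   0.625),
    "slow":      (0.0,   0.5),
}

def sentence_polarity(tokens):
    """Sum (pos - neg) over opinion words; > 0 positive, < 0 negative."""
    return sum(LEXICON.get(t, (0.0, 0.0))[0] - LEXICON.get(t, (0.0, 0.0))[1]
               for t in tokens)

def feature_opinions(tagged_sentences):
    """Pair each noun (candidate product feature) with the polarity of
    its sentence. `tagged_sentences` is a list of [(token, POS)] pairs,
    the shape a POS tagger such as Stanford's would emit."""
    opinions = {}
    for sent in tagged_sentences:
        score = sentence_polarity([w.lower() for w, _ in sent])
        for w, tag in sent:
            if tag.startswith("NN"):   # Penn Treebank noun tags
                opinions.setdefault(w.lower(), []).append(score)
    return opinions

tagged = [
    [("The", "DT"), ("screen", "NN"), ("is", "VBZ"), ("excellent", "JJ")],
    [("Battery", "NN"), ("life", "NN"), ("is", "VBZ"), ("poor", "JJ")],
]
print(feature_opinions(tagged))
# {'screen': [0.875], 'battery': [-0.625], 'life': [-0.625]}
```

    The paper additionally filters candidate features by TF-IDF weight across the corpus; that filtering step is omitted here for brevity.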

    Parallel web crawler architecture for clickstream analysis

    The tremendous growth of the Web causes many challenges for single-process crawlers, including irrelevant answers among search results and coverage and scaling issues. As a result, more robust algorithms are needed to produce more precise and relevant search results in a timely manner. Existing Web crawlers mostly implement link-dependent Web page importance metrics. One barrier to applying these metrics is that they impose considerable communication overhead on multi-agent crawlers. Moreover, they are highly dependent on their own index size, which prevents them from ranking Web pages with complete accuracy. Hence, more refined metrics need to be explored in this area. Proposing a new Web page importance metric requires defining a new architecture as a framework in which to implement it. This paper proposes an architecture for a focused parallel crawler in which decisions about Web page importance are based on a metric that combines clickstream analysis with analysis of contextual similarity to the issued queries.
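    A minimal sketch of such a combined metric: a normalized clickstream score blended with a bag-of-words cosine similarity between the page text and the issued query. The blend weight `alpha` and the normalization assumption are hypothetical, not taken from the paper:

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Bag-of-words cosine similarity between two texts."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def combined_importance(click_score, page_text, query, alpha=0.6):
    """Weighted blend of a clickstream score (assumed pre-normalized to
    [0, 1]) and query similarity; `alpha` is an illustrative tuning weight."""
    return alpha * click_score + (1 - alpha) * cosine_similarity(page_text, query)

# Example: a popular page whose text partially matches the query.
score = combined_importance(0.8, "parallel web crawler architecture",
                            "web crawler", alpha=0.6)
```

    A practical system would replace the bag-of-words similarity with TF-IDF or embedding-based similarity, but the decision rule stays the same: rank the URL frontier by the combined score.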

    Architecture for a parallel focused crawler for clickstream analysis

    The tremendous growth of the Web poses many challenges for general-purpose single-process crawlers, including irrelevant answers among search results and coverage and scaling issues arising from the enormous size of the World Wide Web. More refined and convincing algorithms are therefore in demand to yield more precise and relevant search results in a reasonable amount of time. Because link-based Web page importance metrics are not a complete solution for identifying the best answer set, and because employing such metrics within a multi-process crawler imposes considerable communication overhead on the overall system, a link-independent Web page importance metric is required to govern the priority rule within the queue of fetched URLs. This paper proposes a modest weighted architecture for a focused structured parallel crawler in which credit is assigned to discovered URLs using a combined metric based on clickstream analysis and the textual similarity of Web pages to the specified mapped topic(s).

    Hybrid Machine Learning-Based Approaches for Feature and Overfitting Reduction to Model Intrusion Patterns

    An intrusion detection system (IDS), whether a device or a software agent, plays a significant role in network and system security by continuously monitoring traffic behaviour to detect malicious activity. The literature includes IDSs that leverage models trained to detect known attack behaviours, but such models suffer from low accuracy or high overfitting. This work aims to enhance IDS performance by modelling observed traffic with various single and ensemble classifiers and by lowering classifier overfitting on a reduced set of features. We apply several feature reduction techniques, including Linear Regression, LASSO, Random Forest, Boruta, and autoencoders, to the CSE-CIC-IDS2018 dataset to provide a training set for classifiers including Decision Tree, Naïve Bayes, neural networks, Random Forest, and XGBoost. Our experiments show that the Decision Tree classifier on autoencoder-based reduced feature sets yields the lowest overfitting among the tested combinations.
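    The two ideas in this pipeline, feature reduction followed by an overfitting check, can be sketched as below. A simple low-variance filter stands in for the paper's heavier reducers (LASSO, Boruta, autoencoders), and the overfitting indicator is the usual train/test accuracy gap; the data and threshold are illustrative assumptions:

```python
def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def low_variance_filter(rows, threshold=0.01):
    """Drop near-constant feature columns (a lightweight stand-in for
    reducers such as LASSO, Boruta, or an autoencoder bottleneck).
    Returns the kept column indices and the reduced rows."""
    n_features = len(rows[0])
    keep = [j for j in range(n_features)
            if variance([r[j] for r in rows]) > threshold]
    return keep, [[r[j] for j in keep] for r in rows]

def overfitting_gap(train_acc, test_acc):
    """Overfitting indicator: accuracy gap between training data and
    held-out data; smaller is better."""
    return train_acc - test_acc

rows = [
    [1.0, 0.0, 5.2],
    [1.0, 1.0, 4.8],
    [1.0, 0.0, 9.1],
]
keep, reduced = low_variance_filter(rows)
print(keep)  # [1, 2]  (column 0 is constant and gets dropped)
```

    In the paper's setting, each (reducer, classifier) pair would be scored by this gap on CSE-CIC-IDS2018, which is how the autoencoder-plus-Decision-Tree combination is identified as the least overfitted.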